Monotonically Improving Limit-Optimal Strategies in Finite-State Decision Processes
Authors
Abstract
In every finite-state leavable gambling problem and in every finite-state Markov decision process with discounted, negative or positive reward criteria there exists a Markov strategy which is monotonically improving and optimal in the limit along every history. An example is given to show that for the positive and gambling cases such strategies cannot be constructed by simply switching to a "better" action or gamble at each successive return to a state.
AMS Subject Classification (1980): primary 60G40, 90C40; secondary 90C39
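The abstract states an existence result and does not describe a construction. For the discounted case only, a familiar point of comparison is classical policy iteration. Each policy-improvement step in a finite discounted MDP yields a stationary policy whose value is at least as large in every state. The sketch below shows that textbook algorithm. It is not the paper's construction. It produces a sequence of stationary policies across iterations, not a single Markov strategy that improves along each history. The abstract also notes that naive "switch to a better action" schemes fail in the positive and gambling cases, so nothing here should be read as extending beyond discounted rewards. The two-state MDP, the discount factor 0.9, and all function names are hypothetical.

```python
"""Policy iteration on a small discounted MDP (illustrative sketch only)."""

# Hypothetical two-state, two-action MDP (not from the paper).
# TRANSITIONS[s][a] is a list of (probability, next_state) pairs;
# REWARDS[s][a] is the immediate reward for taking action a in state s.
TRANSITIONS = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
REWARDS = [
    [0.0, 1.0],
    [2.0, 0.0],
]


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(policy, transitions, rewards, gamma):
    """Exact value of a stationary policy: solve (I - gamma * P_pi) v = r_pi."""
    n = len(policy)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for s in range(n):
        act = policy[s]
        b[s] = rewards[s][act]
        for p, t in transitions[s][act]:
            a[s][t] -= gamma * p
    return solve_linear(a, b)


def q_value(s, act, values, transitions, rewards, gamma):
    """One-step lookahead: reward of `act` in `s`, then continue with `values`."""
    return rewards[s][act] + gamma * sum(p * values[t] for p, t in transitions[s][act])


def improve(policy, values, transitions, rewards, gamma, tol=1e-12):
    """Greedy improvement; keep the current action unless another is strictly better."""
    new_policy = []
    for s, current in enumerate(policy):
        def q(act):
            return q_value(s, act, values, transitions, rewards, gamma)
        best = max(range(len(rewards[s])), key=q)
        new_policy.append(best if q(best) > q(current) + tol else current)
    return new_policy


def policy_iteration(transitions, rewards, gamma, policy):
    """Return the final policy and the value vector of every policy visited."""
    history = []
    while True:
        values = evaluate(policy, transitions, rewards, gamma)
        history.append(values)
        new_policy = improve(policy, values, transitions, rewards, gamma)
        if new_policy == policy:
            return policy, history
        policy = new_policy


if __name__ == "__main__":
    final, history = policy_iteration(TRANSITIONS, REWARDS, 0.9, [0, 1])
    print(final)
    for v in history:
        print([round(x, 6) for x in v])
```

The run starts from the hypothetical policy [0, 1]. One improvement step moves the state values from (0, 0) to approximately (19, 20). After that, the greedy step changes nothing and the loop stops. Ties keep the current action, which guarantees termination in a finite MDP.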
Similar resources
Limit Synchronization in Markov Decision Processes
Markov decision processes (MDP) are finite-state systems with both strategic and probabilistic choices. After fixing a strategy, an MDP produces a sequence of probability distributions over states. The sequence is eventually synchronizing if the probability mass accumulates in a single state, possibly in the limit. Precisely, for 0 ≤ p ≤ 1 the sequence is p-synchronizing if a probability distri...
OPTIMAL CONTROL OF AVERAGE REWARD MARKOV DECISION PROCESSES' CONSTRAINED CONTINUOUS-TIME FINITE Eugene
The paper studies optimization of average-reward continuous-time finite state and action Markov Decision Processes with multiple criteria and constraints. Under the standard unichain assumption, we prove the existence of optimal K-switching strategies for feasible problems with K constraints. For switching randomized strategies, the decisions depend on the current state and the time spent i...
Blackwell Optimality in Markov Decision Processes with Partial Observation by Dinah Rosenberg,
A Blackwell ε-optimal strategy in a Markov Decision Process is a strategy that is ε-optimal for every discount factor sufficiently close to 1. We prove the existence of Blackwell ε-optimal strategies in finite Markov Decision Processes with partial observation. 1. Introduction. A well-known result by Blackwell [3] states that, in any Markov Decision Process (MDP hereafter) with finitely many st...
Simplex Algorithm for Countable-State Discounted Markov Decision Processes
We consider discounted Markov Decision Processes (MDPs) with countably-infinite state spaces, finite action spaces, and unbounded rewards. Typical examples of such MDPs are inventory management and queueing control problems in which there is no specific limit on the size of inventory or queue. Existing solution methods obtain a sequence of policies that converges to optimality i...
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and are used as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...
Journal:
- Math. Oper. Res.
Volume 12
Publication date: 1987